doi: 10.29271/jcpsppg.2025.01.04
ABSTRACT
Objective: To evaluate the accuracy and reliability of the Artificial Intelligence (AI)-assisted WebCeph application for lateral cephalometric analysis, compared with the manual tracing technique, based on 12 parameters of Steiner’s cephalometric analysis.
Study Design: Descriptive, cross-sectional study.
Place and Duration of the Study: Department of Orthodontics, KRL Hospital, Islamabad, Pakistan, between June and November 2024.
Methodology: The study was performed on 30 pre-treatment lateral cephalometric radiographs. Each radiograph was analysed using two techniques: the current gold standard, i.e. the conventional manual cephalometric approach, and the AI-assisted WebCeph technique. Steiner’s linear and angular measurements were obtained. SPSS version 25 was used for data analysis. The intraclass correlation coefficient (ICC) was calculated between the digital and conventional methods to determine accuracy. An ICC value below 0.75 indicated poor-to-moderate agreement, a value within the range of 0.75-0.90 indicated good agreement, and a value >0.90 indicated excellent agreement. Intra-operator reliability was determined using a paired t-test. A p-value <0.05 was considered statistically significant. Normality of all the data was assessed using the Shapiro-Wilk test.
Results: All measurements, except SN-OP (°), showed ICC values above 0.75. An ICC value >0.90 was recorded for five parameters (SNB, ANB, SN-Go-Gn (°), UL to S-line, and LL to S-line (mm)). Six out of 12 parameters (SNA, U1-NA, L1-NB, and interincisal angle (°), and U1-NA and L1-NB (mm)) obtained ICC values between 0.75 and 0.90. On repeated measurements, no statistically significant difference was observed, as the p-value was >0.05 for all parameters in both the conventional and WebCeph groups, indicating good reliability.
Conclusion: The WebCeph showed performance at par with the human gold standard, with excellent to good agreement for the majority of the assessed variables in terms of accuracy, as well as acceptable intra-examiner reliability.
Key Words: Artificial Intelligence, Cephalometric analysis, Orthodontics, WebCeph, Human gold standard.
INTRODUCTION
Lateral cephalometric radiographs are a valuable component of the standardised records in orthodontic diagnosis and decision-making.1 Cephalometric analysis involves tracings and measurements performed on cephalometric radiographs. The current gold standard involving manual tracing of anatomical cephalometric landmarks on acetate sheets is tedious and time-consuming.
Artificial intelligence (AI) refers to the simulation of human intelligence through complex computerised programmes inspired by the biological nervous system.
The introduction and application of AI have provided powerful tools that can aid orthodontists in diagnosis and decision-making.2-4
Systematic reviews suggest good reliability and accuracy of various AI-based cephalometric applications.3,5 One such tool is WebCeph™ (AssembleCircle Corporation, Republic of Korea), a web-based application that uses AI-assisted prediction of cephalometric landmarks and subsequent automated analysis to provide diagnostic information.6 Its features include automated cephalometric landmark identification and analysis, surgical simulations, computerised superimposition, case review, case rooms, and digital storage of records, among others. The software also allows for manual revision of the cephalometric landmarks.7 The platform provides free cephalometric landmark detection and analysis, thereby eliminating purchasing cost and expediting the analysis process; hence, evaluation of its performance is important.
Present data show consistent and highly accurate results for automated cephalometric landmark detection, but the evidence is prone to bias.8,9 Although AI-based software is rapidly gaining popularity, evidence regarding its performance in terms of accuracy and reliability is inconclusive. Considering the varying results, the present study was undertaken.
This study aimed to evaluate the accuracy and reliability of the WebCeph cephalometric analysis. Such AI-based tools can save time and excessive effort, thereby enhancing clinical productivity.
METHODOLOGY
A descriptive, cross-sectional study was conducted at the Department of Orthodontics, KRL Hospital, Islamabad, Pakistan. A non-probability, consecutive sampling technique was employed. The sample size of 30 was calculated using a correlation sample size calculator, with a significance level of 5%, a power of 80%, and a correlation coefficient of r = 0.5.10 Both male and female patients, between 12 and 35 years of age, reporting to the orthodontic clinic, were included in the study. Standardised, good-quality radiographs were selected. All the selected radiographs were captured by the same operator, using the same equipment. Patients with gross asymmetry, craniofacial deformity and syndromes, unerupted or missing permanent incisors and molars, or impacted teeth, and those who had undergone prior orthodontic treatment, were excluded.
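The stated sample size can be reproduced with the Fisher z-transformation formula that correlation sample size calculators commonly use. The sketch below is an assumption about the underlying calculation, since the source names only a calculator and not its formula; it also assumes a two-sided test, and the function name `correlation_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_sample_size(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation r (two-sided test).

    Assumed formula (Fisher z-transformation):
        n = ((z_alpha + z_beta) / C)^2 + 3,  C = 0.5 * ln((1 + r) / (1 - r))
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    c = 0.5 * math.log((1 + r) / (1 - r))          # Fisher z of the expected correlation
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


# Inputs reported in the methodology: alpha = 5%, power = 80%, r = 0.5
print(correlation_sample_size(r=0.5, alpha=0.05, power=0.80))
```

Under these assumptions the formula gives 30, which matches the sample size reported above.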
This study received ethical approval from the Ethical Review Committee of the KRL Hospital, Islamabad, Pakistan. The study was conducted over a duration of six months between June and November 2024. Fifty lateral cephalometric radiographs that matched the criteria were chosen and numbered from 1 to 50. Thirty of the 50 radiographs were randomly selected using random.org, a randomisation utility.
Each radiograph was analysed using two techniques: the conventional manual cephalometric technique and the digital AI-assisted WebCeph technique. Hand tracings were carried out on transparent acetate sheets on an illuminated view box using a lead pencil. Cephalometric landmarks and planes were marked. Bilateral structures were averaged and presented as a single landmark (Figure 1).11 Steiner’s cephalometric analysis measurements, eight angular and four linear (Table I), were carried out manually and recorded for statistical evaluation. WebCeph analysis was carried out by importing high-resolution JPG versions of all cephalograms, provided by the radiographic imaging services, into the WebCeph web application. Angular and linear measurements of Steiner’s analysis were obtained and recorded. To check for intra-operator reliability, 10 of the 30 radiographs were randomly selected and re-evaluated after a 4-week interval using both the digital and the conventional methods.
SPSS Statistics (IBM Corporation, USA) version 25.0 was used for statistical analysis. Descriptive statistics were calculated for qualitative and quantitative parameters. Quantitative parameters, i.e. age, angular measurements (SNA, SNB, ANB, SN-Go-Gn, U1-NA, L1-NB, SN-OP, and interincisal angle), and linear measurements (U1-NA, L1-NB, UL to S line, and LL to S line), were expressed as mean and standard deviation (SD).
The intraclass correlation coefficient (ICC) was calculated between the digital and the conventional methods to determine accuracy. An ICC value below 0.75 indicated poor or moderate agreement. An ICC value within the range of 0.75-0.90 indicated good agreement, while values greater than 0.90 indicated excellent agreement. Normality of all the data was assessed using the Shapiro-Wilk test, and a parametric test was selected. Intra-operator reliability, at a 4-week interval, was determined using the paired t-test. Statistical significance was set at p <0.05.12
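The selection step above amounts to drawing 30 identifiers without replacement from the numbers 1 to 50. The study used random.org for this; the sketch below is only a local stand-in using Python's pseudo-random generator, and the function name `select_radiographs` and the seed value are hypothetical.

```python
import random


def select_radiographs(pool_size=50, sample_size=30, seed=None):
    """Draw `sample_size` distinct radiograph numbers from 1..pool_size."""
    rng = random.Random(seed)                 # a seeded generator makes the draw repeatable
    numbers = range(1, pool_size + 1)         # radiographs were numbered from 1 to 50
    return sorted(rng.sample(numbers, sample_size))  # sampling without replacement


selected = select_radiographs(pool_size=50, sample_size=30, seed=2024)
print(selected)
```

Fixing the seed is a design choice that lets the same selection be regenerated for an audit; it is not something the source describes.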
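To make the agreement statistic concrete, the following sketch computes an ICC from paired measurements with a two-way ANOVA decomposition. The source does not state which ICC model was selected in SPSS, so the form used here (two-way random effects, absolute agreement, single measures, often written ICC(2,1)) is an assumption, and the measurement values are invented purely for illustration.

```python
def icc_two_way_absolute(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    `ratings` is a list of rows, one per radiograph; each row holds the value
    obtained by each method (e.g. [manual, webceph]).
    """
    n = len(ratings)       # number of radiographs (subjects)
    k = len(ratings[0])    # number of methods (raters)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, methods, and residual error
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical SNA values (degrees): [manual tracing, WebCeph] per radiograph
sna_pairs = [[82.0, 81.5], [79.0, 80.0], [85.5, 84.0], [78.0, 78.5], [83.0, 84.5]]
print(round(icc_two_way_absolute(sna_pairs), 3))
```

An absolute-agreement form is used here because a systematic offset between the two methods should lower the score when one method is meant to substitute for the other; a consistency form would ignore such an offset.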
RESULTS
ICC for comparison between the manual and the AI-based WebCeph method exhibited the following results: All measurements, except SN-OP (°), showed ICC values >0.75, denoting good agreement in terms of accuracy (Table II). A higher ICC value >0.9, i.e. excellent agreement, was obtained for five parameters, i.e. SNB, ANB, SN-Go-Gn (°), UL to S-line, and LL to S-line (mm), while six of the 12 parameters, i.e. SNA, U1-NA, L1-NB, and interincisal angle (°), and U1-NA and L1-NB (mm), obtained ICC values between 0.75 and 0.90.
Figure 1: Cephalometric landmarks and planes. (1) Sella (S), (2) Nasion (N), (3) Porion (Po), (4) Orbitale (Or), (5) Posterior nasal spine (PNS), (6) Anterior nasal spine (ANS), (7) A point, (8) B point, (9) Pogonion (Pog), (10) Gnathion (Gn), (11) Menton (Me), (12) Gonion (Go), (13) S point (Steiner analysis), (14) Labrale superius (LS), (15) Labrale inferius (LI), and (16) Soft tissue pogonion (Pog’).
Table I: Cephalometric measurements.

| Parameter | Definition |
|---|---|
| Angular parameters (°) | |
| SNA | Anteroposterior position of the maxilla relative to the anterior cranial base |
| SNB | Anteroposterior position of the mandible relative to the anterior cranial base |
| ANB | The difference between SNA and SNB angles defines the mutual relationship in the sagittal plane of the maxillary and mandibular bases |
| U1-NA | Angle between the nasion-A point (NA) line and the long axis of the upper incisor |
| L1-NB | Angle between the nasion-B point (NB) line and the long axis of the lower incisor |
| SN-Go-Gn | Angle between the SN plane and the mandibular plane (Go-Gn) |
| SN-OP | Angle between the SN plane and the occlusal plane |
| Interincisal angle | The angle between the axis of the upper incisor and the axis of the lower incisor |
| Linear parameters (mm) | |
| U1-NA | Linear measurement from the tip of the upper central incisor to the NA line |
| L1-NB | Linear measurement from the tip of the lower central incisor to the NB line |
| UL to S line | Linear measurement from the most prominent point of the upper lip to Steiner’s S line |
| LL to S line | Linear measurement from the most prominent point of the lower lip to Steiner’s S line |
Table II: Comparison between the conventional and the digital WebCeph methods.

| Parameters | ICCᵃ (conventional vs. WebCeph) | 95% CIᵇ |
|---|---|---|
| Angular parameters (°) | | |
| SNA | 0.814 | 0.614 to 0.911 |
| SNB | 0.900 | 0.791 to 0.952 |
| ANB | 0.906 | 0.701 to 0.963 |
| U1-NA | 0.899 | 0.433 to 0.967 |
| L1-NB | 0.821 | -0.044 to 0.946 |
| SN-Go-Gn | 0.940 | 0.870 to 0.972 |
| SN-OP | 0.672 | 0.214 to 0.854 |
| Interincisal angle | 0.887 | 0.124 to 0.967 |
| Linear parameters (mm) | | |
| U1-NA | 0.856 | 0.605 to 0.939 |
| L1-NB | 0.885 | 0.703 to 0.950 |
| UL to S line | 0.910 | 0.790 to 0.959 |
| LL to S line | 0.917 | 0.825 to 0.960 |

ᵃ ICC, intraclass correlation coefficient (>0.90 excellent; 0.75-0.90 good; <0.75 poor to moderate). ᵇ CI, confidence interval.
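The agreement bands in the footnote can be applied mechanically to the tabulated ICC values, as in the sketch below. The function name `agreement_band` is hypothetical, and the boundaries follow the stated thresholds literally. SNB is tabulated as 0.900, so a strict `> 0.90` test on the rounded value returns "good", whereas the text counts SNB among the excellent parameters, presumably on the basis of the unrounded value.

```python
# ICC values copied from Table II (conventional vs. WebCeph)
TABLE_II_ICC = {
    "SNA (°)": 0.814,
    "SNB (°)": 0.900,
    "ANB (°)": 0.906,
    "U1-NA (°)": 0.899,
    "L1-NB (°)": 0.821,
    "SN-Go-Gn (°)": 0.940,
    "SN-OP (°)": 0.672,
    "Interincisal angle (°)": 0.887,
    "U1-NA (mm)": 0.856,
    "L1-NB (mm)": 0.885,
    "UL to S line (mm)": 0.910,
    "LL to S line (mm)": 0.917,
}


def agreement_band(icc):
    """Map an ICC value to the agreement band defined in the methodology."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    return "poor to moderate"


bands = {name: agreement_band(value) for name, value in TABLE_II_ICC.items()}
for name, band in bands.items():
    print(f"{name}: {TABLE_II_ICC[name]:.3f} -> {band}")
```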
Table III: Mean differences, standard deviation, and correlation coefficient (intra-examiner error) for repeated measurements of digital and conventional tracings.

| Cephalometric measurements | Conventional method: difference (mean ± SDᵃ) | Conventional method: paired t-test p-value | Digital WebCeph method: difference (mean ± SDᵃ) | Digital WebCeph method: paired t-test p-value |
|---|---|---|---|---|
| Angular parameters (°) | | | | |
| SNA | -0.10 ± 1.45 | 0.832 | -0.05 ± 0.67 | 0.819 |
| SNB | 0.10 ± 0.99 | 0.758 | 0.06 ± 0.51 | 0.718 |
| ANB | -0.20 ± 1.23 | 0.619 | -0.15 ± 0.41 | 0.279 |
| U1-NA | 0.40 ± 2.07 | 0.555 | -0.30 ± 1.06 | 0.394 |
| L1-NB | 0.70 ± 3.97 | 0.591 | -0.15 ± 0.94 | 0.627 |
| SN-Go-Gn | -0.30 ± 1.34 | 0.496 | -0.01 ± 0.03 | 0.343 |
| SN-OP | -0.40 ± 1.08 | 0.269 | -0.01 ± 1.04 | 0.976 |
| Interincisal angle | -0.90 ± 4.53 | 0.546 | 0.41 ± 0.58 | 0.053 |
| Linear parameters (mm) | | | | |
| U1-NA | 0.10 ± 1.17 | 0.794 | 0.16 ± 0.45 | 0.293 |
| L1-NB | -0.20 ± 0.54 | 0.269 | -0.11 ± 0.23 | 0.170 |
| UL to S line | 0.25 ± 0.63 | 0.244 | 0.09 ± 0.33 | 0.436 |
| LL to S line | 0.35 ± 0.71 | 0.153 | 0.08 ± 0.51 | 0.662 |

ᵃ SD, standard deviation; p >0.05, not significant.
The paired t-test for intra-examiner error showed no statistically significant difference (p >0.05, Table III) in either the conventional or the WebCeph group, indicating good reliability. The largest mean differences between consecutive tracing trials among the angular parameters were 0.41° (interincisal angle) and 0.30° (U1-NA) for the digital WebCeph technique, and 0.90° (interincisal angle) and 0.70° (L1-NB) for the conventional approach.
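The paired t-test result for any row of Table III can be checked from its summary statistics alone. The sketch below assumes that the tabulated SD is the standard deviation of the paired differences and that n = 10 (the number of radiographs re-evaluated); the function name `paired_t_from_summary` is hypothetical, and 2.262 is the standard two-sided 5% critical value of Student's t with 9 degrees of freedom.

```python
import math

# Two-sided 5% critical value of Student's t distribution with 9 degrees of freedom
T_CRITICAL_DF9 = 2.262


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic: mean difference divided by its standard error."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error


# Interincisal angle, WebCeph column of Table III: 0.41 ± 0.58, assumed n = 10
t_value = paired_t_from_summary(mean_diff=0.41, sd_diff=0.58, n=10)
is_significant = abs(t_value) > T_CRITICAL_DF9
print(round(t_value, 2), is_significant)
```

For this row, which has the smallest reported p-value (0.053), the statistic is about 2.24, just below the critical value, which is consistent with a borderline but non-significant result.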
DISCUSSION
With current advancements in AI technology, great achievements in the orthodontic domain are anticipated. While tracing accuracy and reliability can be a limiting factor in conventional cephalometry,13 studies indicate that AI-based applications detect landmarks at par with human experts,14 and with greater reliability than the conventional method, i.e. they always detected identical landmark positions upon repeated trials.15,16 Recent studies on WebCeph also show acceptable intra-observer reliability.12,17,18 In the present study, repeated measurements showed no statistically significant difference (p >0.05) in either the digital or the conventional group, indicating good reliability.
Some studies evaluating the accuracy of the WebCeph in comparison with the traditional tracing method, including the present study, show acceptable results,12,18 suggesting that the WebCeph can be an aid to orthodontists; however, literature showing contradictory conclusions exists. Comparing the findings of the present study with similar studies aimed at assessing the accuracy of the fully-automated WebCeph software, some differences were observed. A recent study by Baig et al. showed significant inaccuracies and a lack of reliability in AI-based fully-automated lateral cephalometric analysis using the WebCeph software, in comparison with the gold-standard hand-tracing approach. Statistically significant differences were obtained for 10 out of the 11 measurements.19 Similar results were noted by other studies, although the results are promising for the identification of certain points.20
Kunz et al. in their study comparing the WebCeph with the human gold standard, showed no significant mean difference in any of the nine examined measurements. However, WebCeph exhibited a high possibility of proportional bias. Accuracy was not clinically acceptable for the WebCeph dental analysis.21
Comparing the WebCeph with the semi-automated AutoCAD software, i.e. manual landmark identification followed by automated angular and linear calculations, Yassir et al. showed similar findings, with poor landmark detection and inconsistent results with the automated WebCeph. The authors, therefore, suggest caution when using the software for cephalometric analysis, with supervision by an experienced clinician.22
Similarly, in another study, WebCeph showed significant differences (p <0.05) in landmarks recognised by the digital application. Human experts showed excellent reproducibility (ICC ≥0.9943), whereas the WebCeph showed good reproducibility with ICC ≥0.7868.23 The authors concluded that the WebCeph produced significant errors, with inconsistent and incorrect landmark identification.
Another recent study evaluating the accuracy of the fully-automated WebCeph and OrthoDx software vs. non-automated manual landmark marking via the Dolphin software showed statistically significant favourable results for the angular parameters. Linear parameters and soft tissue measurements showed weak correlation. Therefore, manual intervention is required in order to minimise errors when using AI-assisted fully-automated software for cephalometric evaluation.24 The present study showed excellent-to-good agreement for all angular and linear measurements, except the SN-OP (°), which produced an ICC value of 0.672, indicating poor-to-moderate agreement.
Advances in AI technology are rapid, but AI models and algorithms require further refinement and testing. Although findings from the present study indicate good agreement between the WebCeph technique and the manual cephalometric tracing method, at present, digital technology cannot completely overtake or replace the orthodontist's role in cephalometric diagnosis and clinical decision-making. Systematic reviews and meta-analyses on AI-assisted cephalometric landmark detection propose further research due to high risk of bias in the existing literature.9,25 A recent umbrella review illustrated erroneous automated cephalometric landmark detection with limited accuracy, suggesting verification from a trained orthodontist.25
A key limitation of this study is that landmark detection and evaluation by the human expert was performed by a single examiner. Despite sufficient clinical experience, assessment by human experts can be susceptible to errors. Therefore, for a more accurate gold standard assessment, a mean value for each parameter examined by more than two orthodontists could be obtained.
CONCLUSION
Accuracy of the AI-assisted WebCeph cephalometric analysis is at par with the human gold standard. Excellent agreement was obtained for five of the 12 cephalometric parameters. Six of the 12 parameters indicated good agreement. In terms of intra-examiner reliability, both the WebCeph and the human gold standard showed acceptable results at detecting identical landmark positions upon repeated trials.
ETHICAL APPROVAL:
This study received ethical approval from the Ethical Review Committee of the KRL Hospital, Islamabad, Pakistan (Ref. No: KRL-HI-ERC-May21/25).
PATIENTS’ CONSENT:
Informed consent was obtained from all participants included in the study.
COMPETING INTEREST:
The authors declared no conflict of interest.
AUTHORS’ CONTRIBUTION:
MAW: Study design, data collection, analysis, and manuscript writing.
AMA: Study design and critical review.
Both authors approved the final version of the manuscript to be published.
REFERENCES